NCBI/GenBank BLAST Output XML Parser Tool

Authors

  • David Ream
  • Andor J Kiss
Abstract

We describe a small, freely available computer script that extracts 'real world' sequence descriptions from the BLASTX results for sequences generated by the stand-alone ncbi-blast-2.2.26 suite of tools (available from NCBI/GenBank). Our Python (2.7) script is intended to make name extraction feasible for thousands, or hundreds of thousands, of sequences, such as those generated by BLASTX analysis of cDNAs obtained by RNA-Seq (transcriptome) in next-generation sequencing (NGS) experiments. The script makes it easier to interrogate the large BLASTX output of a transcriptome experiment with familiar tools such as Microsoft Excel or LibreOffice Calc. The script was written and tested on the Linux operating system (Ubuntu 12.04 LTS), but should work in any Python 2.7 compatible environment. We include some example files and help documentation.
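
To illustrate the kind of extraction the abstract describes, here is a minimal sketch, not the authors' script. It assumes the BLASTX run was saved in the legacy BLAST XML output format (element names such as `Iteration_query-def` and `Hit_def`), uses only the Python 3 standard library rather than Python 2.7, and keeps just the first hit per query. The function names `top_hits` and `write_csv` and the column layout are hypothetical choices made for this example.

```python
import csv
import sys
import xml.etree.ElementTree as ET


def top_hits(xml_source):
    """Yield (query, accession, description, e-value) for the first hit of each query.

    xml_source may be a file name or a binary file object holding BLAST XML.
    """
    # iterparse streams the file, so very large BLASTX outputs need not fit in memory.
    for _, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "Iteration":
            continue
        query = elem.findtext("Iteration_query-def", default="")
        hit = elem.find("Iteration_hits/Hit")  # first reported hit, if any
        if hit is None:
            yield (query, "", "no hit", "")
        else:
            yield (
                query,
                hit.findtext("Hit_accession", default=""),
                hit.findtext("Hit_def", default=""),
                hit.findtext("Hit_hsps/Hsp/Hsp_evalue", default=""),
            )
        elem.clear()  # release the subtree that has already been processed


def write_csv(xml_source, out):
    """Write one CSV row per query, ready to open in a spreadsheet program."""
    writer = csv.writer(out)
    writer.writerow(["query", "accession", "description", "evalue"])
    writer.writerows(top_hits(xml_source))


if __name__ == "__main__" and len(sys.argv) == 2:
    # Example: python this_sketch.py blastx_results.xml > hits.csv
    write_csv(sys.argv[1], sys.stdout)
```

The resulting CSV can be opened directly in Microsoft Excel or LibreOffice Calc. Keeping only the first hit is a simplification; a fuller tool might filter by e-value or report several hits per query.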


Related resources

Zerg: A Very Fast BLAST Parser Library

SUMMARY Zerg is a library of sub-routines that parses the output from all NCBI BLAST programs (Blastn, Blastp, Blastx, Tblastn and Tblastx) and returns the attributes of a BLAST report to the user. It is optimized for speed, being especially useful for large-scale genomic analysis. Benchmark tests show that Zerg is over two orders of magnitude faster than some widely used BLAST parsers. AVAIL...


Database resources of the National Center for Biotechnology Information

In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data available through NCBI's web site. NCBI resources include Entrez, the Entrez Programming Utilities, My NCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, B...


Database resources of the National Center for Biotechnology Information: update

In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI's website. NCBI resources include Entrez, PubMed, PubMed Central, LocusLink, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic...


Database resources of the National Center for Biotechnology

In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI's Web site. NCBI resources include Entrez, PubMed, PubMed Central (PMC), LocusLink, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Elec...


NOBLAST and JAMBLAST: New Options for BLAST and a Java Application Manager for BLAST results

UNLABELLED NOBLAST (New Options for BLAST) is an open source program that provides a new user-friendly tabular output format for various NCBI BLAST programs (Blastn, Blastp, Blastx, Tblastn, Tblastx, Mega BLAST and Psi BLAST) without any use of a parser and provides E-value correction in case of use of segmented BLAST database. JAMBLAST using the NOBLAST output allows the user to manage, view a...




Publication date: 2013